What if I only have one sample?
Approximate the variability you’d expect to see in other samples!
Bootstrapping!
A Bootstrap Resample
We can use the statistics from these bootstrap resamples to approximate the true sampling distribution!
Why???
Estimating a population parameter
Confidence Intervals
Capture a range of plausible values for the population parameter.
Are more likely to capture the population parameter than a point estimate.
Using bootstrap resamples to generate a confidence interval
From your original sample, draw observations with replacement until you have as many observations as your original sample.
This is your bootstrap resample.
Repeat this process many, many times.
Calculate a numerical summary (e.g., mean, median) for each bootstrap resample.
These are your bootstrap statistics.
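The steps above can be sketched in a few lines. This is a minimal illustration in Python using only the standard library, not the workflow used later in these slides. The data values, the seed, and the names `sample`, `bootstrap_resample`, and `bootstrap_means` are all hypothetical.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical original sample (made-up values, for illustration only)
sample = [4.1, 5.6, 3.8, 6.2, 5.0, 4.7, 5.9, 4.4]


def bootstrap_resample(data):
    """Draw len(data) observations from data, with replacement."""
    return random.choices(data, k=len(data))


# Repeat the resampling many times, keeping one statistic per resample
reps = 1000
bootstrap_means = [
    statistics.mean(bootstrap_resample(sample)) for _ in range(reps)
]

print(len(bootstrap_means))            # number of bootstrap statistics
print(statistics.mean(sample))         # statistic from the original sample
print(statistics.stdev(bootstrap_means))  # spread of the bootstrap statistics
```

Because each resample is drawn with replacement, some observations appear more than once and others not at all, which is why the bootstrap means differ from one resample to the next.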
Bootstrap Distribution
Definition: the distribution of the bootstrap statistics, one from each bootstrap resample.
Displays the variability in the statistic that could have happened with repeated sampling.
Approximates the shape and spread of the true sampling distribution! (It is centered at the sample statistic, not at the population parameter.)
Confidence Interval
Goal: Capture a range of plausible values for the population parameter.
How do I get this plausible range of values?
Bootstrapping!
Penguins!
Parameter: \(\beta_1\)
The slope of the relationship between bill length and body mass for all penguins in the Palmer Archipelago
Generating a bootstrap resample
Step 1: specify() your response and explanatory variables
Step 2: generate() bootstrap resamples
Step 3: calculate() the statistic of interest
Declare your variables!
Generate your resamples!
reps – the number of resamples you want to generate
"bootstrap" – the method that should be used to generate the new samples
Your turn!
Why do we resample with replacement when creating a bootstrap distribution?
When we resample with replacement from our original sample what are we assuming about our sample?
Calculate your statistics!
"slope" – the statistic of interest
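To make the three steps concrete, here is a rough sketch of what they amount to for a slope: resample whole rows with replacement, then refit the line on each resample. It is written in Python with the standard library purely for illustration and does not reproduce `specify()`, `generate()`, or `calculate()`. The `(x, y)` pairs are made-up stand-ins, not the penguin data, and treating body mass as explanatory and bill length as the response is an assumption.

```python
import random

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical (body mass, bill length) pairs; made-up values,
# NOT the Palmer penguins data.
pairs = [(3200, 38.1), (3500, 39.5), (3800, 41.0), (4100, 42.3),
         (4500, 44.8), (4900, 45.9), (5300, 47.6), (5700, 49.2)]


def slope(data):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    return sxy / sxx


def bootstrap_slopes(data, reps):
    """Return `reps` slopes, one per bootstrap resample of the rows."""
    slopes = []
    while len(slopes) < reps:
        # Resample whole rows so each x stays paired with its y
        resample = random.choices(data, k=len(data))
        # Skip the rare resample where every x is identical (slope undefined)
        if len({x for x, _ in resample}) > 1:
            slopes.append(slope(resample))
    return slopes


boot = bootstrap_slopes(pairs, reps=1000)
print(slope(pairs))           # slope from the original sample
print(min(boot), max(boot))   # range of the bootstrap slopes
```

Resampling rows rather than individual values keeps each penguin's two measurements together, so every resample still describes a relationship between the two variables.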
The final product
What does one dot / point on a bootstrap distribution represent?
A plausible range of values for: \(\beta_1\)
The 95% confidence interval is…
| Lower Bound | Upper Bound |
|---|---|
| 0.00355 | 0.00453 |
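One common way to turn a bootstrap distribution into a 95% interval is the percentile method: take the 2.5th and 97.5th percentiles of the bootstrap statistics. The slides do not say which method produced the bounds above, so treat this as an assumption. The sketch below is in Python with the standard library, and `boot_stats` is simulated, hypothetical data rather than the actual bootstrap slopes.

```python
import random
import statistics

random.seed(3)  # fixed seed so the sketch is reproducible

# Hypothetical bootstrap statistics (simulated stand-ins for bootstrap slopes)
boot_stats = [random.gauss(0.004, 0.00025) for _ in range(1000)]


def percentile_interval_95(stats):
    """Return the 2.5th and 97.5th percentiles of the bootstrap statistics."""
    # n=40 yields cut points at 2.5%, 5%, ..., 97.5%
    cuts = statistics.quantiles(stats, n=40, method="inclusive")
    return cuts[0], cuts[-1]


lower, upper = percentile_interval_95(boot_stats)
inside = sum(lower <= s <= upper for s in boot_stats)

print(lower, upper)
print(inside / len(boot_stats))  # share of bootstrap statistics inside the interval
```

The interval keeps the middle 95% of the bootstrap statistics and trims 2.5% from each tail.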
What do we hope is captured by this interval?
How do we interpret this interval?
“We are 95% confident the slope of the relationship between bill length and body mass for all penguins in the Palmer Archipelago is between 0.00355 and 0.00453.”
What does it mean to be 95% confident?
Classic interpretation mistakes
“95% of the time the population parameter would fall between 0.00355 and 0.00453.”
“We are 95% confident the sample statistic is in our interval.”